A method for enhancing the expression of a selected gene in an organism while avoiding or reducing co-suppression involves the
synthesis of a DNA which is altered in nucleotide sequence and is capable of expressing a protein, ideally identical to a
protein already expressed by a DNA already present in the organism. Sequence similarity between the two genes is thereby
reduced enough to eliminate the phenomenon of co-suppression, allowing over-expression of a specific protein. The method is
particularly suitable for plants.
ENHANCEMENT OF GENE EXPRESSION

This invention relates to a method and material for enhancing gene expression
in organisms, particularly in plants. One particular, but not exclusive, application of the
invention is the enhancement of carotenoid biosynthesis in plants such as tomato
(Lycopersicon spp.).

In order to increase production of a protein by an organism, it is known
practice to insert into the genome of the target organism one or more additional copies
of the protein-encoding gene by genetic transformation. Such copies would normally
be identical to a gene which is already present in the plant or, alternatively, they may be
identical copies of a foreign gene. In theory, multiple gene copies should, on
expression, cause the organism to produce the selected protein in greater than normal
amounts; this is referred to as "overexpression". Experiments have shown, however,
that low expression or no expression of certain genes can result when multiple copies
of the gene are present (Napoli et al 1990 and Dorlhac de Borne et al 1994). This
phenomenon is referred to as co-suppression. It most frequently occurs when
recombinant genes are introduced into a plant already containing a gene similar in
nucleotide sequence. It has also been observed in endogenous plant genes and
transposable elements. The effects of co-suppression are not always immediate and can
be influenced by developmental and environmental factors in the primary transformants
or in subsequent generations.

The general rule is to transform plants with a DNA sequence whose codon usage
approximates the codon frequency used by the plant. Experimental analysis
has shown that introducing a second copy of a gene identical in sequence to a gene
already in the plant genome can, in some instances, result in the expression of the
transgene, the endogenous gene or both genes being inactivated (co-suppression). Exactly how
co-suppression occurs is unclear; however, there are
several theories invoking both transcriptional and post-transcriptional blocks.

As a rule the nucleotide sequence of an inserted gene is "optimised" in two
respects. First, the codon usage of the inserted gene is modified to approximate the
preferred codon usage of the species into which the gene is to be inserted. Second, inserted
genes may also be optimised in respect of nucleotide usage, with the aim of
approximating the purine to pyrimidine ratio to that commonly found in the target
species. When genes of bacterial origin are transferred to plants, for example, it is well
known that the nucleotide usage has to be altered to avoid A-rich regions,
common in bacterial genes, which may be misread by the eukaryotic expression
machinery as a polyadenylation signal, causing premature termination of the transcript and
hence truncation of the polypeptide. All of this is common practice, and it is entirely logical that an
inserted sequence should mimic the codon and nucleotide usage of the target organism
for optimum expression.

An object of the present invention is to provide means by which co-suppression
may be obviated or mitigated.

According to the present invention there is provided a method of enhancing
expression of a selected protein by an organism having a gene which produces said
protein, comprising inserting into a genome of said organism a DNA whose nucleotide
sequence is such that the RNA produced on transcription is different from, but the
protein produced on translation is the same as, that expressed by the gene already
present in the genome.

The invention also provides a gene construct comprising in sequence a
promoter which is operable in a target organism, a coding region encoding a protein
and a termination signal, characterised in that the nucleotide sequence of said
construct is such that the RNA produced on transcription is different from, but the
protein produced on translation is the same as, that expressed by the gene already
present in the genome.

The inserted sequence may have a constitutive promoter or a tissue-preferential or
developmentally preferential promoter.

It is preferred that the promoter used in the inserted construct be different from
that used by the gene already present in the target genome. However, our evidence
suggests that it may be sufficient that the region between the transcription start site and the
translation initiation codon, sometimes referred to as the "5' intervening region", be
different. In other words, the co-suppression phenomenon is probably associated with
the transcription step of expression rather than the translation step: it occurs at the
DNA or RNA level or both.
The invention further provides transgenic plants having enhanced ability to
express a selected gene, and seed and propagating material derived from said plants.

This invention is of general applicability to the expression of genes but will be
illustrated in one specific embodiment by a method of enhancing
expression of the phytoene synthase gene, which is necessary for the biosynthesis of
carotenoids in plants, the overexpression being achieved by the use of a modified
transgene having a different nucleotide sequence from the endogenous sequence.

Preferably said modified phytoene synthase gene has the sequence SEQ-ID-NO-1.

The invention also provides a modified chloroplast targeting sequence
comprising nucleotides 1 to 417 of SEQ-ID-NO-1.

In simple terms, our invention requires that protein expression be enhanced by
inserting a gene construct which is altered, with respect to the gene already present in
the genome, by maximising the dissimilarity of nucleotide usage while maintaining
identity of the encoded protein. In other words, the concept is to express the same
protein from genes which have different nucleotide sequences within their coding
region and, preferably, within the promoter region as well. It is desirable to approximate the
nucleotide usage (the purine to pyrimidine ratio) of the inserted gene to that of the
gene already present in the genome. We also believe it to be desirable to avoid
codons in the inserted gene which are uncommon in the target organism and to
approximate the overall codon usage to the reported codon usage for the target
genome.

The degree to which a sequence may be modified depends on the frequency of
degenerate codons. In some instances a high proportion of changes may be made,
particularly to the third nucleotide of a triplet, resulting in low DNA (and
consequently RNA) sequence homology between the inserted gene and the gene
already present, while in other cases, because of the presence of unique codons (for
example ATG for methionine and TGG for tryptophan, which have no synonyms), the
number of changes which are available may be low. The number of changes which are
available can be determined readily by examining the codon degeneracy of the gene
which is already present.
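The availability of changes can be illustrated with a short Python sketch. It assumes the standard nuclear genetic code and simply asks, codon by codon, whether any synonymous alternative exists; it ignores codon usage preferences, and the function names are illustrative rather than part of the described method.

```python
from collections import defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Group codons by the amino acid (or stop, "*") they encode.
SYNONYMS = defaultdict(list)
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS[_aa].append(_codon)


def split_codons(cds):
    """Split an in-frame coding sequence into upper-case codons."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def synonymous_alternatives(codon):
    """Codons encoding the same amino acid as `codon`, excluding itself."""
    return [c for c in SYNONYMS[CODON_TABLE[codon]] if c != codon]


def count_changeable(cds):
    """Count codons that could be replaced by a synonym and those that cannot."""
    counts = {"changeable": 0, "fixed": 0}
    for codon in split_codons(cds):
        key = "changeable" if synonymous_alternatives(codon) else "fixed"
        counts[key] += 1
    return counts


# First 30 nucleotides of the natural TOM5 coding sequence (SEQ ID NO 2).
tom5_start = "ATGTCTGTTGCCTTGTTATGGGTTGTTTCT"
print(count_changeable(tom5_start))  # {'changeable': 8, 'fixed': 2}
```

Only the ATG (Met) and TGG (Trp) codons in this stretch are fixed; every other codon offers at least one synonymous alternative.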
To obtain the gene for insertion in accordance with this invention it may be
necessary to synthesise it. The general parameters within which the nucleotide
sequence of the synthetic gene may be selected, compared with the gene already present,
are:

1. Minimise the nucleotide sequence similarity between the synthetic gene and the
gene already present in the plant genome;

2. Maintain the identity of the protein encoded by the coding region;

3. Maintain approximately the optimum codon usage indicated for the target
genome;

4. Maintain approximately the same ratio of purine to pyrimidine bases; and

5. Change the promoter or, at least, the 5'-intervening region.
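Parameters 1 to 3 can be approximated by a greedy, codon-by-codon choice. The Python sketch below is an illustration under stated assumptions, not the procedure actually used to design MTOM5: for each codon it picks the synonym that differs at the most positions from the original, skipping any codon listed in a hypothetical `rare_codons` set that stands in for codons uncommon in the target genome. It does not balance base composition (parameter 4) or address the promoter (parameter 5).

```python
from collections import defaultdict

BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # standard genetic code

SYNONYMS = defaultdict(list)
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS[_aa].append(_codon)


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def translate(cds):
    """Translate an in-frame coding sequence (stop codons shown as '*')."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds, rare_codons=frozenset()):
    """Greedy synonymous recoding that maximises per-codon dissimilarity.

    Candidate synonyms exclude any codon in `rare_codons` (hypothetical
    stand-in for a codon usage threshold). The candidate with the most
    mismatches to the original wins; ties go to the alphabetically first
    codon so the result is repeatable.
    """
    cds = cds.upper()
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        options = [c for c in SYNONYMS[CODON_TABLE[codon]] if c not in rare_codons]
        if not options:  # every synonym excluded: keep the original codon
            options = [codon]
        out.append(max(sorted(options), key=lambda c: mismatches(c, codon)))
    return "".join(out)


tom5_start = "ATGTCTGTTGCCTTGTTATGGGTTGTTTCT"
new = recode(tom5_start)
print(new, translate(new) == translate(tom5_start))
```

Because a greedy choice can introduce unwanted restriction sites, stem-loops or a skewed base composition, any sequence produced this way would still need the restriction site, stem-loop and codon usage checks described in Example 1.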

We have worked with the phytoene synthase gene of tomato. The DNA
sequence of the endogenous phytoene synthase gene is known (EMBL Accession Number
Y00521), and it was discovered that this sequence contained two sequencing errors toward
the 3' end. These errors were corrected as follows: (1) delete the cytosine at
position 1365 and (2) insert a cytosine at position 1421. The corrected phytoene synthase
sequence (Bartley et al 1992) is given herein as SEQ-ID-NO-2. Beginning with that
natural sequence we selected modifications according to the parameters quoted above
and synthesised the modified gene, which we designated MTOM5 and which has the
sequence SEQ-ID-NO-1. Figure 1 herewith shows an alignment of the natural and
synthesised genes, with retained nucleotides indicated by dots and alterations by dashes.
The modified gene MTOM5 has 63% homology at the DNA level and 100% at the protein
level, and the proportion of adenine plus thymine (the A+T content) is 54% in the
modified gene compared with 58% for the natural sequence.
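The two summary figures quoted above, per cent DNA identity and A+T content, can be computed as in the following Python sketch. It assumes the two coding sequences are compared position by position without gaps, which is reasonable here because both are 1239 nucleotides long and encode the same protein codon for codon; alignment programs may apply other counting conventions. Function names are illustrative.

```python
def percent_identity(seq_a, seq_b):
    """Per cent identical positions between two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (no gaps assumed)")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def at_content(seq):
    """Per cent of bases that are adenine (A) or thymine (T)."""
    seq = seq.upper()
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


# First alignment row of Figure 1 (nucleotides 1-30).
tom5 = "ATGTCTGTTGCCTTGTTATGGGTTGTTTCT"
mtom5 = "ATGAGCGTGGCACTTCTTTGGGTGGTGAGC"
print(round(percent_identity(tom5, mtom5), 1))  # 53.3
```

Applied to the full SEQ ID NO 1 and SEQ ID NO 2 strings, the same two functions give whole-gene values.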

In the sequence listings provided herewith, SEQ ID NO 1 is the DNA sequence
of the synthetic (modified TOM5) gene referred to as MTOM5 in Figure 1, SEQ ID
NO 2 is the natural genomic phytoene synthase (Psy1) gene referred to as GTOM5 in
Figure 1, and SEQ ID NO 3 is the translation product of both GTOM5 and MTOM5.
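That both genes encode the same polypeptide can be checked mechanically. The Python sketch below assumes the standard genetic code and in-frame coding sequences starting at the ATG; the helper names are illustrative.

```python
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # standard genetic code


def translate(cds):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def encode_same_protein(cds_a, cds_b):
    """True if two coding sequences translate to the same polypeptide."""
    return translate(cds_a) == translate(cds_b)


# First and last alignment rows of Figure 1 (TOM5 versus MTOM5).
print(translate("ATGTCTGTTGCCTTGTTATGGGTTGTTTCT"))   # MSVALLWVVS
print(encode_same_protein("CAAAGATAA", "CAGAGGTGA"))  # True
```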
In tomato (Lycopersicon esculentum), it has been shown that the carotenoid
lycopene is primarily responsible for the red colouration of developing fruit
(Bird et al 1991). The enzyme phytoene synthase, referred to herein
as Psy1, is important for the production of phytoene, a precursor of
lycopene.
Psy1 catalyses the conversion of geranylgeranyl diphosphate to phytoene, the
first dedicated step in carotenoid biosynthesis.

The regulated expression of an active Psy1 gene is necessary for the
production of lycopene and consequently the red colouration of fruit during ripening.
This is illustrated by the yellow-flesh phenotype of tomato fruits observed in a
naturally occurring mutant in which the Psy1 gene is inactive. In addition, transgenic
plants containing an antisense Psy1 transgene, which specifically down-regulates Psy1
expression, have also produced the yellow-flesh phenotype in ripe fruit.

When transgenic plants expressing another copy of the Psy1 gene (referred to
as TOM5) under the control of a constitutive promoter (the Cauliflower
Mosaic Virus 35S promoter) were produced, approximately 30% of the primary
transformants produced mature yellow fruit, indicative of co-suppression.
Although some of the primary transformants produced an increased
carotenoid content, subsequent generations did not exhibit this phenotype, thus
providing evidence that co-suppression is not always immediate and can occur in
later generations.

The nucleotide sequence of Psy1 is known, and hence the amino acid sequence could be
deduced.

With reference to published codon usage data tabulated from GenBank (Ken-nosuke Wada
et al 1992), a synthetic DNA was produced by altering the nucleotide sequence to one
which still had a reasonable frequency of codon use in tomato and which retained the
amino acid sequence of Psy1. A simple swap between codons was used where
there are only two codon options; in other cases the codons were changed
within the codon usage bias of tomato. Nucleotide sequence analysis indicated that the
synthetic DNA has a nucleotide similarity with Psy1 (TOM5, Bartley et al 1992) of
63% and an amino acid sequence similarity of 100%.

The synthetic gene was then cloned into plant transformation vectors under the
control of the 35S promoter. These were then transferred into tomato plants by
Agrobacterium-mediated transformation, and both the endogenous and the synthetic genes appear
to express the protein. Analysis of the primary transformants shows no
evidence, such as the production of yellow fruit, of co-suppression between
the two genes.



WO 97/46690 



-6- 



PCT/GB97/01414 



The present invention will now be described by way of illustration in the 
following examples. 

EXAMPLE 1 

The coding region of the cDNA which encodes tomato phytoene synthase,
TOM5 (EMBL accession number Y00521), was modified because the original sequence
contained two errors towards the 3' end of the sequence. The sequence reported by
Bartley et al 1992 (J Biol Chem 267:5036-5039) for TOM5 cDNA homologues
therefore differs from TOM5 (EMBL accession number Y00521). For the purpose of
producing the synthetic gene, the sequence used is a corrected version of the
TOM5 cDNA which is identical to Psy1 (Bartley et al 1992).
Design of the sequence. 

1. Potential restriction endonuclease cleavage sites were considered given the
constraints of the amino acid sequence. Useful sites around the predicted target
sequence cleavage site were introduced to aid subsequent manipulation of the
leader.

2. A simple swap between codons was used where there are only two
codon options (e.g. lysine). In other cases codons were changed within the
codon usage bias of tomato as given by Ken-nosuke Wada et al (Codon usage
tabulated from GenBank genetic sequence data, 1992. Nucleic Acids Research
20 Suppl: 2111-2118). Priority was given to reducing homology and avoiding
uncommon codons rather than producing a representative spread of codon
usage.

3. A BamHI site was introduced at either end of the sequence to facilitate cloning
into the initial. At the 5' end, four A residues were placed upstream of the ATG in
accordance with the dicot start-site consensus sequence (Cavener and Ray 1991, Eukaryotic
start and stop translation sites. NAR 19: 3185-3192).

4. The synthetic gene has been cloned into the vector pGEM4Z such that it can be
transcribed from the SP6 promoter.

5. Restriction site, stem-loop and codon usage analyses were performed, all results
being satisfactory.

6. The modified TOM5 sequence was termed CGS48 or MTOM5.
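Item 3 above can be illustrated with a small Python sketch that adds a BamHI recognition site (GGATCC) at each end and four A residues 5' of the ATG. The exact arrangement (A residues immediately adjacent to the ATG, no extra bases outside the sites) and the function name are assumptions for illustration. Because GGATCC is palindromic, checking one strand is enough to detect an internal site.

```python
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence (palindromic)


def add_cloning_flanks(cds, leader="AAAA", site=BAMHI_SITE):
    """Return site + leader + CDS + site for a hypothetical cloning cassette.

    Raises ValueError if the CDS does not start with ATG or already contains
    the recognition site, which would cause the insert to be cut internally.
    """
    cds = cds.upper()
    if not cds.startswith("ATG"):
        raise ValueError("coding sequence should start with ATG")
    if site in cds:
        raise ValueError("coding sequence contains an internal %s site" % site)
    construct = site + leader + cds + site
    # Joining the pieces must not create extra copies of the site at the junctions.
    if construct.count(site) != 2:
        raise ValueError("a junction created an additional recognition site")
    return construct


print(add_cloning_flanks("ATGAAATGA"))  # GGATCCAAAAATGAAATGAGGATCC
```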
Sequence analysis

CGS48 AT content = 54%

TOM5 AT content = 58%

The nucleotide homology between TOM5 and CGS48 is 63%.
Amino acid sequence homology is 100%.

In summary, the sequence TOM5 (Acc. No. Y00521) was extracted from the
GenBank database and modified to incorporate the following corrections: deleted C at
1365, inserted C at 1421. CGS48 is based on the CDS of the corrected Y00521 sequence and
was designed to differ from the original sequence whilst retaining translation product
identity and trying to maintain optimal tomato codon usage.
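The two corrections can be applied programmatically, as in the following Python sketch. It assumes that both positions are 1-based coordinates in the original, uncorrected Y00521 sequence and that "insert a C at 1421" means the new C occupies position 1421, ahead of the base previously there. Edits are applied from the 3' end backwards so that the deletion does not shift the coordinate of the insertion. The function name and the `y00521` variable in the comment are hypothetical.

```python
def apply_corrections(seq, deletions=(), insertions=()):
    """Apply single-base corrections given in original 1-based coordinates.

    deletions:  iterable of (position, expected_base) to remove
    insertions: iterable of (position, base) to insert before the base
                originally at `position`
    """
    edits = [(pos, "delete", base) for pos, base in deletions]
    edits += [(pos, "insert", base) for pos, base in insertions]
    bases = list(seq.upper())
    # Work from the 3' end so earlier positions keep their original meaning.
    for pos, kind, base in sorted(edits, key=lambda edit: edit[0], reverse=True):
        index = pos - 1
        if kind == "delete":
            if bases[index] != base:
                raise ValueError(f"expected {base} at {pos}, found {bases[index]}")
            del bases[index]
        else:
            bases.insert(index, base)
    return "".join(bases)


# Toy example; for the real sequence the (hypothetical) call would be
# apply_corrections(y00521, deletions=[(1365, "C")], insertions=[(1421, "C")])
print(apply_corrections("AACGTT", deletions=[(3, "C")], insertions=[(5, "C")]))  # AAGCTT
```

One deletion plus one insertion leaves the overall length unchanged; only the stretch between the two positions is shifted by one base.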
Assembly of CGS48
CGS48 was divided into three parts:
CGS48A: BamHI / KpnI
CGS48B: KpnI / SacI
CGS48C: SacI / BamHI
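Dividing CGS48 in this way relies on the internal KpnI and SacI sites each occurring once, with BamHI sites only at the ends. A minimal Python check is sketched below; the recognition sequences are the standard ones for these enzymes, while the function names and the toy sequence are illustrative. All three sites are palindromic, so searching one strand suffices.

```python
RECOGNITION_SITES = {"BamHI": "GGATCC", "KpnI": "GGTACC", "SacI": "GAGCTC"}


def find_sites(seq, site):
    """Return 1-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(site, start + 1)
    return positions


def site_map(seq, enzymes=RECOGNITION_SITES):
    """Map each enzyme name to the positions of its recognition sites."""
    return {name: find_sites(seq, site) for name, site in enzymes.items()}


def check_unique(seq, enzyme):
    """Return the single site position for `enzyme`, or raise if not unique."""
    hits = find_sites(seq, RECOGNITION_SITES[enzyme])
    if len(hits) != 1:
        raise ValueError(f"{enzyme} site occurs {len(hits)} times")
    return hits[0]


toy = "AAGGTACCTTGAGCTCAA"  # toy sequence, not part of CGS48
print(site_map(toy))  # {'BamHI': [], 'KpnI': [3], 'SacI': [11]}
```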

All three were designed to be cloned on EcoRI / HindIII fragments. The
sequences were divided into oligonucleotide fragments, following computer analysis, to
give unique complementarity in the overlapping regions used for gene assembly.

The oligonucleotides were synthesised on an Applied Biosystems 380B DNA
synthesiser using standard cyanoethyl phosphoramidite chemistry. The oligonucleotides
were gel purified and assembled into full-length fragments using our own procedures.

The assembled fragments were cloned into pUC18 via their EcoRI/HindIII
overhangs.

Clones were sequenced bi-directionally using "forward" and "reverse"
sequencing primers together with the appropriate "build" primers for the top and
bottom strands, using the dideoxy-mediated chain termination method for plasmid
DNA.

Inserts from correct CGS48A, B and C clones were isolated by digestion with
BamHI / KpnI, KpnI / SacI and SacI / BamHI respectively. The KpnI and SacI ends of the
BamHI / KpnI and SacI / BamHI fragments were phosphatased. All three fragments
were co-ligated into BamHI-cut and phosphatased pGEM4Z. Clones with correctly
sized inserts oriented with the 5' end of the insert adjacent to the SmaI site were
identified by PCR amplification of isolated colonies and digestion of purified plasmid
DNA with a selection of restriction enzymes.

A CsCl-purified plasmid DNA preparation was made from one of these clones.
This clone (CGS48) was sequenced bi-directionally using "forward" and "reverse"
sequencing primers together with the appropriate "build" primers for the top and
bottom strands, using the dideoxy-mediated chain termination method for plasmid
DNA.

EXAMPLE 2

Construction of the MTOM5 vector with the CaMV 35S promoter

The MTOM5 (CGS48) DNA fragment described in EXAMPLE 1 was cloned
into the vector pJRIRi (Figure 2) to give the clone pRD13 (Figure 3). The clone
CGS48 was digested with SmaI and XbaI and then cloned into pJRIRi which had been cut
with SmaI and XbaI, producing the clone pRD13.

EXAMPLE 3

Generation and analysis of plants transformed with the vector pRD13

The pRD13 vector was transferred to Agrobacterium tumefaciens LBA4404 (a
micro-organism widely available to plant biotechnologists) and used to transform
tomato plants. Transformation of tomato stem segments followed standard protocols
(e.g. Bird et al, Plant Molecular Biology 11, 651-662, 1988). Transformed plants were
identified by their ability to grow on media containing the antibiotic kanamycin. Forty-nine
individual plants were regenerated and grown to maturity. None of these plants
produced fruit which changed colour to yellow rather than red when ripening. The
presence of the pRD13 construct in all of the plants was confirmed by polymerase
chain reaction analysis. DNA blot analysis of all plants indicated that the insert copy
number was between one and seven. Northern blot analysis of fruit from one plant
indicated that the MTOM5 gene was expressed. Six transformed plants were selfed to
produce progeny. None of the progeny plants produced fruit which changed colour to
yellow rather than red during ripening.

The results are summarised in Table 1 below. The incidence of yellow, or
mixed yellow/red (for example, striped), fruits is indicative of suppression of phytoene
synthesis. Thus, with the normal GTOM5 construct, 28% of the transgenic plants
displayed the co-suppressed phenotype. All the plants carrying the modified MTOM5
construct of this invention had red fruit, demonstrating that no suppression of phytoene
synthesis had occurred in any of them.

TABLE 1

Construct                                                   35S-GTOM5-nos   35S-MTOM5-nos
Total number of fruiting plants                             39              49
Number of plants producing yellow fruit                     8               0
Number of plants producing mixed yellow and red fruit
  or temporal changes                                       3               0
Number of plants producing red fruit                        28              49
% plants showing co-suppression of Psy1                     28%             0%
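The co-suppression percentages in the last row of Table 1 are the yellow plus mixed yellow/red counts divided by the total number of fruiting plants. The arithmetic is reproduced in the Python sketch below using the counts from the table; the data structure and function name are illustrative.

```python
TABLE_1 = {
    "35S-GTOM5-nos": {"total": 39, "yellow": 8, "mixed": 3, "red": 28},
    "35S-MTOM5-nos": {"total": 49, "yellow": 0, "mixed": 0, "red": 49},
}


def cosuppression_percent(row):
    """Per cent of fruiting plants with yellow or mixed yellow/red fruit."""
    if row["yellow"] + row["mixed"] + row["red"] != row["total"]:
        raise ValueError("phenotype classes do not add up to the total")
    return 100.0 * (row["yellow"] + row["mixed"]) / row["total"]


for construct, row in TABLE_1.items():
    print(construct, f"{cosuppression_percent(row):.0f}%")
# 35S-GTOM5-nos 28%
# 35S-MTOM5-nos 0%
```

The check that the three phenotype classes sum to the total (8 + 3 + 28 = 39, and 0 + 0 + 49 = 49) confirms the table is internally consistent.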



Sequence Alignment of Modified TOM5 
with the synthetic MTOM5 

TOM5  ATG TCT GTT GCC TTG TTA TGG GTT GTT TCT    30
MTOM5 ATG AGC GTG GCA CTT CTT TGG GTG GTG AGC    30
       M   S   V   A   L   L   W   V   V   S

TOM5  CCT TGT GAC GTC TCA AAT GGG ACA AGT TTC    60
MTOM5 CCA TGC GAT GTG AGT AAC GGC ACT TCA TTT    60
       P   C   D   V   S   N   G   T   S   F

TOM5  ATG GAA TCA GTC CGG GAG GGA AAC CGT TTT    90
MTOM5 ATG GAG AGT GTG AGA GAA GGT AAT AGA TTC    90
       M   E   S   V   R   E   G   N   R   F

TOM5  TTT GAT TCA TCG AGG CAT AGG AAT TTG GTG   120
MTOM5 TTC GAC AGT TCT CGT CAC CGT AAC CTT GTT   120
       F   D   S   S   R   H   R   N   L   V

TOM5  TCC AAT GAG AGA ATC AAT AGA GGT GGT GGA   150
MTOM5 AGT AAC GAA CGT ATA AAC AGG GGA GGA GGT   150
       S   N   E   R   I   N   R   G   G   G

TOM5  AAG CAA ACT AAT AAT GGA CGG AAA TTT TCT   180
MTOM5 AAA CAG ACA AAC AAC GGT AGA AAG TTC TCA   180
       K   Q   T   N   N   G   R   K   F   S

TOM5  GTA CGG TCT GCT ATT TTG GCT ACT CCA TCT   210
MTOM5 GTT AGA TCA GCA ATC CTT GCA ACA CCT AGC   210
       V   R   S   A   I   L   A   T   P   S

TOM5  GGA GAA CGG ACG ATG ACA TCG GAA CAG ATG   240
MTOM5 GGT GAG AGA ACT ATG ACT AGC GAG CAA ATG   240
       G   E   R   T   M   T   S   E   Q   M

TOM5  GTC TAT GAT GTG GTT TTG AGG CAG GCA GCC   270
MTOM5 GTG TAC GAC GTC GTA CTT CGT CAA GCT GCA   270
       V   Y   D   V   V   L   R   Q   A   A

TOM5  TTG GTG AAG AGG CAA CTG AGA TCT ACC AAT   300
MTOM5 CTA GTT AAA CGT CAG TTA CGT AGT ACT AAC   300
       L   V   K   R   Q   L   R   S   T   N

TOM5  GAG TTA GAA GTG AAG CCG GAT ATA CCT ATT   330
MTOM5 GAA CTT GAG GTT AAA CCT GAC ATT CCA ATA   330
       E   L   E   V   K   P   D   I   P   I

TOM5  CCG GGG AAT TTG GGC TTG TTG AGT GAA GCA   360
MTOM5 CCT GGA AAC CTT GGA CTT CTT TCT GAG GCT   360
       P   G   N   L   G   L   L   S   E   A

TOM5  TAT GAT AGG TGT GGT GAA GTA TGT GCA GAG   390
MTOM5 TAC GAC AGA TGC GGA GAG GTT TGC GCA GAA   390
       Y   D   R   C   G   E   V   C   A   E

TOM5  TAT GCA AAG ACG TTT AAC TTA GGA ACT ATG   420
MTOM5 TAC GCT AAA ACC TTC AAT TTG GGT ACC ATG   420
       Y   A   K   T   F   N   L   G   T   M

TOM5  CTA ATG ACT CCC GAG AGA AGA AGG GCT ATC   450
MTOM5 TTG ATG ACA CCA GAA AGG CGT CGT GCA ATA   450
       L   M   T   P   E   R   R   R   A   I

TOM5  TGG GCA ATA TAT GTA TGG TGC AGA AGA ACA   480
MTOM5 TGG GCT ATT TAC GTT TGG TGT AGG CGT ACT   480
       W   A   I   Y   V   W   C   R   R   T

TOM5  GAT GAA CTT GTT GAT GGC CCA AAC GCA TCA   510
MTOM5 GAC GAG TTA GTG GAC GGA CCT AAT GCT AGT   510
       D   E   L   V   D   G   P   N   A   S

TOM5  TAT ATT ACC CCG GCA GCC TTA GAT AGG TGG   540
MTOM5 TAC ATA ACA CCC GCT GCT CTT GAC AGA TGG   540
       Y   I   T   P   A   A   L   D   R   W

TOM5  GAA AAT AGG CTA GAA GAT GTT TTC AAT GGG   570
MTOM5 GAG AAC CGT TTG GAG GAC GTG TTT AAC GGC   570
       E   N   R   L   E   D   V   F   N   G

TOM5  CGG CCA TTT GAC ATG CTC GAT GGT GCT TTG   600
MTOM5 AGA CCT TTC GAT ATG TTG GAC GGA GCA CTT   600
       R   P   F   D   M   L   D   G   A   L

TOM5  TCC GAT ACA GTT TCT AAC TTT CCA GTT GAT   630
MTOM5 AGT GAC ACT GTG AGC AAT TTC CCT GTG GAC   630
       S   D   T   V   S   N   F   P   V   D

TOM5  ATT CAG CCA TTC AGA GAT ATG ATT GAA GGA   660
MTOM5 ATC CAA CCT TTT CGG GAC ATG ATC GAG GGC   660
       I   Q   P   F   R   D   M   I   E   G

TOM5  ATG CGT ATG GAC TTG AGA AAA TCG AGA TAC   690
MTOM5 ATG AGA ATG GAT CTT CGT AAG TCT CGT TAT   690
       M   R   M   D   L   R   K   S   R   Y

TOM5  AAA AAC TTC GAC GAA CTA TAC CTT TAT TGT   720
MTOM5 AAG AAT TTT GAT GAG TTG TAT TTG TAC TGC   720
       K   N   F   D   E   L   Y   L   Y   C

TOM5  TAT TAT GTT GCT GGT ACG GTT GGG TTG ATG   750
MTOM5 TAC TAC GTG GCA GGA ACC GTG GGC CTT ATG   750
       Y   Y   V   A   G   T   V   G   L   M

TOM5  AGT GTT CCA ATT ATG GGT ATC GCC CCT GAA   780
MTOM5 TCA GTG CCT ATC ATG GGA ATT GCA CCA GAG   780
       S   V   P   I   M   G   I   A   P   E

TOM5  TCA AAG GCA ACA ACA GAG AGC GTA TAT AAT   810
MTOM5 AGT AAA GCT ACT ACT GAA TCT GTT TAC ACC   810
       S   K   A   T   T   E   S   V   Y   N

TOM5  GCT GCT TTG GCT CTG GGG ATC GCA AAT CAA   840
MTOM5 GCA GCA CTA GCA TTA GGT ATA GCT AAC CAG   840
       A   A   L   A   L   G   I   A   N   Q

TOM5  TTA ACT AAC ATA CTC AGA GAT GTT GGA GAA   870
MTOM5 CTT ACA AAT ATC TTG AGG GAC GTG GGT GAG   870
       L   T   N   I   L   R   D   V   G   E

TOM5  GAT GCC AGA AGA GGA AGA GTC TAC TTG CCT   900
MTOM5 GAC GCA CGT AGG GGT CGT GTG TAT CTC CCA   900
       D   A   R   R   G   R   V   Y   L   P

TOM5  CAA GAT GAA TTA GCA CAG GCA GGT CTA TCC   930
MTOM5 CAG GAC GAG CTC GCT CAA GCT GGA TTG AGT   930
       Q   D   E   L   A   Q   A   G   L   S

TOM5  GAT GAA GAT ATA TTT GCT GGA AGG GTG ACC   960
MTOM5 GAC GAG GAC ATT TTC GCA GGT CGT GTT ACA   960
       D   E   D   I   F   A   G   R   V   T

TOM5  GAT AAA TGG AGA ATC TTT ATG AAG AAA CAA   990
MTOM5 GAC AAG TGG AGG ATT TTC ATG AAA AAG CAG   990
       D   K   W   R   I   F   M   K   K   Q

TOM5  ATA CAT AGG GCA AGA AAG TTC TTT GAT GAG  1020
MTOM5 ATT CAC CGT GCT CGT AAA TTT TTC GAC GAA  1020
       I   H   R   A   R   K   F   F   D   E

TOM5  GCA GAG AAA GGC GTG ACA GAA TTG AGC TCA  1050
MTOM5 GCT GAA AAG GGA GTT ACT GAG CTT TCT AGT  1050
       A   E   K   G   V   T   E   L   S   S

TOM5  GCT AGT AGA TTC CCT GTA TGG GCA TCT TTG  1080
MTOM5 GCA TCA AGG TTT CCA GTT TGG GCC AGC CTT  1080
       A   S   R   F   P   V   W   A   S   L

TOM5  GTC TTG TAC CGC AAA ATA CTA GAT GAG ATT  1110
MTOM5 GTG CTC TAT AGA AAG ATT TTG GAC GAA ATC  1110
       V   L   Y   R   K   I   L   D   E   I

TOM5  GAA GCC AAT GAC TAC AAC AAC TTC ACA AAG  1140
MTOM5 GAG GCT AAC GAT TAT AAT AAT TTT ACT AAA  1140
       E   A   N   D   Y   N   N   F   T   K

TOM5  AGA GCA TAT GTG AGC AAA TCA AAG AAG TTG  1170
MTOM5 CGT GCT TAC GTT TCT AAG AGC AAA AAA CTT  1170
       R   A   Y   V   S   K   S   K   K   L

TOM5  ATT GCA TTA CCT ATT GCA TAT GCA AAA TCT  1200
MTOM5 ATC GCT CTT CCA ATC GCT TAC GCT AAG AGC  1200
       I   A   L   P   I   A   Y   A   K   S

TOM5  CTT GTG CCT CCT ACA AAA ACT GCC TCT CTT  1230
MTOM5 TTG GTT CCA CCA ACT AAG ACA GCT AGC TTG  1230
       L   V   P   P   T   K   T   A   S   L

TOM5  CAA AGA TAA                             1239
MTOM5 CAG AGG TGA                             1239
       Q   R   *

DNA SEQUENCE: 62% HOMOLOGY

PROTEIN SEQUENCE: 100% HOMOLOGY
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: ZENECA LIMITED ....
(ii) TITLE OF INVENTION: ENHANCEMENT OF GENE EXPRESSION
(iii) NUMBER OF SEQUENCES: 3

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: IP DEPT., ZENECA AGROCHEMICALS,

(B) STREET: JEALOTT'S HILL RESEARCH STATION,

(C) CITY: BRACKNELL,

(D) STATE: BERKSHIRE

(E) COUNTRY: GB
(F) ZIP: RG42 6ET

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: WO NOT KNOWN

(B) FILING DATE:

(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: HUSKISSON, FRANK M

(C) REFERENCE/DOCKET NUMBER: PPD 50156/WO

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 01344 414822



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1239 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(C) INDIVIDUAL ISOLATE: SYNTHETIC DNA


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

ATGAGCGTGG CACTTCTTTG GGTGGTGAGC CCATGCGATG TGAGTAACGG CACTTCATTT     60

ATGGAGAGTG TGAGAGAAGG TAATAGATTC TTCGACAGTT CTCGTCACCG TAACCTTGTT    120

AGTAACGAAC GTATAAACAG GGGAGGAGGT AAACAGACAA ACAACGGTAG AAAGTTCTCA    180

GTTAGATCAG CAATCCTTGC AACACCTAGC GGTGAGAGAA CTATGACTAG CGAGCAAATG    240

GTGTACGACG TCGTACTTCG TCAAGCTGCA CTAGTTAAAC GTCAGTTACG TAGTACTAAC    300

GAACTTGAGG TTAAACCTGA CATTCCAATA CCTGGAAACC TTGGACTTCT TTCTGAGGCT    360

TACGACAGAT GCGGAGAGGT TTGCGCAGAA TACGCTAAAA CCTTCAATTT GGGTACCATG    420

TTGATGACAC CAGAAAGGCG TCGTGCAATA TGGGCTATTT ACGTTTGGTG TAGGCGTACT    480

GACGAGTTAG TGGACGGACC TAATGCTAGT TACATAACAC CCGCTGCTCT TGACAGATGG    540

GAGAACCGTT TGGAGGACGT GTTTAACGGC AGACCTTTCG ATATGTTGGA CGGAGCACTT    600

AGTGACACTG TGAGCAATTT CCCTGTGGAC ATCCAACCTT TTCGGGACAT GATCGAGGGC    660

ATGAGAATGG ATCTTCGTAA GTCTCGTTAT AAGAATTTTG ATGAGTTGTA TTTGTACTGC    720

TACTACGTGG CAGGAACCGT GGGCCTTATG TCAGTGCCTA TCATGGGAAT TGCACCAGAG    780

AGTAAAGCTA CTACTGAATC TGTTTACACC GCAGCACTAG CATTAGGTAT AGCTAACCAG    840

CTTACAAATA TCTTGAGGGA CGTGGGTGAG GACGCACGTA GGGGTCGTGT GTATCTCCCA    900

CAGGACGAGC TCGCTCAAGC TGGATTGAGT GACGAGGACA TTTTCGCAGG TCGTGTTACA    960

GACAAGTGGA GGATTTTCAT GAAAAAGCAG ATTCACCGTG CTCGTAAATT TTTCGACGAA   1020

GCTGAAAAGG GAGTTACTGA GCTTTCTAGT GCATCAAGGT TTCCAGTTTG GGCCAGCCTT   1080

GTGCTCTATA GAAAGATTTT GGACGAAATC GAGGCTAACG ATTATAATAA TTTTACTAAA   1140

CGTGCTTACG TTTCTAAGAG CAAAAAACTT ATCGCTCTTC CAATCGCTTA CGCTAAGAGC   1200

TTGGTTCCAC CAACTAAGAC AGCTAGCTTG CAGAGGTGA                          1239

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1239 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: LYCOPERSICON ESCULENTUM (TOMATO)

(vii) IMMEDIATE SOURCE:

(B) CLONE: GTOM5 - PHYTOENE SYNTHASE GENE



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

ATGTCTGTTG CCTTGTTATG GGTTGTTTCT CCTTGTGACG TCTCAAATGG GACAAGTTTC     60

ATGGAATCAG TCCGGGAGGG AAACCGTTTT TTTGATTCAT CGAGGCATAG GAATTTGGTG    120

TCCAATGAGA GAATCAATAG AGGTGGTGGA AAGCAAACTA ATAATGGACG GAAATTTTCT    180

GTACGGTCTG CTATTTTGGC TACTCCATCT GGAGAACGGA CGATGACATC GGAACAGATG    240

GTCTATGATG TGGTTTTGAG GCAGGCAGCC TTGGTGAAGA GGCAACTGAG ATCTACCAAT    300

GAGTTAGAAG TGAAGCCGGA TATACCTATT CCGGGGAATT TGGGCTTGTT GAGTGAAGCA    360

TATGATAGGT GTGGTGAAGT ATGTGCAGAG TATGCAAAGA CGTTTAACTT AGGAACTATG    420

CTAATGACTC CCGAGAGAAG AAGGGCTATC TGGGCAATAT ATGTATGGTG CAGAAGAACA    480

GATGAACTTG TTGATGGCCC AAACGCATCA TATATTACCC CGGCAGCCTT AGATAGGTGG    540

GAAAATAGGC TAGAAGATGT TTTCAATGGG CGGCCATTTG ACATGCTCGA TGGTGCTTTG    600

TCCGATACAG TTTCTAACTT TCCAGTTGAT ATTCAGCCAT TCAGAGATAT GATTGAAGGA    660

ATGCGTATGG ACTTGAGAAA ATCGAGATAC AAAAACTTCG ACGAACTATA CCTTTATTGT    720

TATTATGTTG CTGGTACGGT TGGGTTGATG AGTGTTCCAA TTATGGGTAT CGCCCCTGAA    780

TCAAAGGCAA CAACAGAGAG CGTATATAAT GCTGCTTTGG CTCTGGGGAT CGCAAATCAA    840

TTAACTAACA TACTCAGAGA TGTTGGAGAA GATGCCAGAA GAGGAAGAGT CTACTTGCCT    900

CAAGATGAAT TAGCACAGGC AGGTCTATCC GATGAAGATA TATTTGCTGG AAGGGTGACC    960

GATAAATGGA GAATCTTTAT GAAGAAACAA ATACATAGGG CAAGAAAGTT CTTTGATGAG   1020

GCAGAGAAAG GCGTGACAGA ATTGAGCTCA GCTAGTAGAT TCCCTGTATG GGCATCTTTG   1080

GTCTTGTACC GCAAAATACT AGATGAGATT GAAGCCAATG ACTACAACAA CTTCACAAAG   1140

AGAGCATATG TGAGCAAATC AAAGAAGTTG ATTGCATTAC CTATTGCATA TGCAAAATCT   1200

CTTGTGCCTC CTACAAAAAC TGCCTCTCTT CAAAGATAA                          1239

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 402 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: LYCOPERSICON ESCULENTUM (TOMATO)

(vii) IMMEDIATE SOURCE:
(A) LIBRARY: TRANSLATION PRODUCT OF GTOM5 AND MTOM5

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Met Ser Val Ala Leu Leu Trp Val Val Ser Pro Cys Asp Val Ser Asn
1 5 10 15

Gly Thr Ser Phe Met Glu Ser Val Arg Glu Gly Asn Arg Phe Phe Asp
20 25 30

Ser Ser Arg His Arg Asn Leu Val Ser Asn Glu Arg Ile Asn Arg Gly
35 40 45

Gly Gly Lys Gln Thr Asn Asn Gly Arg Lys Phe Ser Val Arg Ser Ala
50 55 60

Ile Leu Ala Thr Pro Ser Gly Glu Arg Thr Met Thr Ser Glu Gln Met
65 70 75 80

Val Tyr Asp Val Val Leu Arg Gln Ala Ala Leu Val Lys Arg Gln Leu
85 90 95

Arg Ser Thr Asn Glu Leu Glu Val Lys Pro Asp Ile Pro Ile Pro Gly
100 105 110

Asn Leu Gly Leu Leu Ser Glu Ala Tyr Asp Arg Cys Gly Glu Val Cys
115 120 125

Ala Glu Tyr Ala Lys Thr Phe Asn Leu Gly Thr Met Leu Met Thr Pro
130 135 140

Glu Arg Arg Arg Ala Ile Trp Ala Ile Tyr Val Trp Cys Arg Arg Thr
145 150 155 160

Asp Glu Leu Val Asp Gly Pro Asn Ala Ser Tyr Ile Thr Pro Ala Ala
165 170 175

Leu Asp Arg Trp Glu Asn Arg Leu Glu Asp Val Phe Asn Gly Arg Pro
180 185 190

Phe Asp Met Leu Asp Gly Ala Leu Ser Asp Thr Val Ser Asn Phe Pro
195 200 205

Val Asp Ile Gln Pro Phe Arg Asp Met Ile Glu Gly Met Arg Met Asp
210 215 220

Leu Arg Lys Ser Arg Tyr Lys Asn Phe Asp Glu Leu Tyr Leu Tyr Cys
225 230 235 240

Tyr Tyr Val Ala Gly Thr Val Gly Leu Met Ser Val Pro Ile Met Gly
245 250 255

Ile Ala Pro Glu Ser Lys Ala Thr Thr Glu Ser Val Tyr Asn Ala Ala
260 265 270

Leu Ala Leu Gly Ile Ala Asn Gln Leu Thr Asn Ile Leu Arg Asp Val
275 280 285

Gly Glu Asp Ala Arg Arg Gly Arg Val Tyr Leu Pro Gln Asp Glu Leu
290 295 300

Ala Gln Ala Gly Leu Ser Asp Glu Asp Ile Phe Ala Gly Arg Val Thr
305 310 315 320

Ile His Arg Ala Arg Lys Phe Phe Asp Glu Ala Glu Lys Gly Val Thr
325 330 335

Glu Leu Ser Ser Ala Ser Arg Phe Pro Val Trp Ala Ser Leu Val Leu
340 345 350

Tyr Arg Lys Ile Leu Asp Glu Ile Glu Ala Asn Asp Tyr Asn Asn Phe
355 360 365

Thr Lys Arg Ala Tyr Val Ser Lys Ser Lys Lys Leu Ile Ala Leu Pro
370 375 380

Ile Ala Tyr Ala Lys Ser Leu Val Pro Pro Thr Lys Thr Ala Ser Leu
385 390 395 400

Gln Arg
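SEQ ID NO: 3 is given as the translation product of GTOM5 and MTOM5. A minimal Python sketch of how a stretch of the nucleotide listing above can be checked against the protein listing follows. It assumes the standard genetic code (NCBI translation table 1) and that the figure after each 60-nucleotide line gives the position of that line's last base, so that nucleotides 61 to 120 form codons 21 to 40 and should match residues 21 to 40 of SEQ ID NO: 3. The helper names `translate` and `to_one_letter` are illustrative only.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Three-letter to one-letter amino acid codes, as used in SEQ ID NO: 3.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def translate(dna: str) -> str:
    """Translate DNA in reading frame 1; '*' marks a stop codon."""
    seq = "".join(dna.split()).upper()  # drop the spaces between 10-nt blocks
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def to_one_letter(residues: str) -> str:
    """Convert space-separated three-letter codes to one-letter codes."""
    return "".join(THREE_TO_ONE[r] for r in residues.split())


# Nucleotides 61-120 of the DNA listing (codons 21-40).
dna_61_120 = "ATGGAATCAG TCCGGGAGGG AAACCGTTTT TTTGATTCAT CGAGGCATAG GAATTTGGTG"
# Residues 21-40 of SEQ ID NO: 3.
protein_21_40 = (
    "Met Glu Ser Val Arg Glu Gly Asn Arg Phe "
    "Phe Asp Ser Ser Arg His Arg Asn Leu Val"
)

print(translate(dna_61_120))                                  # MESVREGNRFFDSSRHRNLV
print(translate(dna_61_120) == to_one_letter(protein_21_40))  # True
```

The same `translate` call on the final, 39-nucleotide line of the DNA listing gives `LVPPTKTASLQR*`, that is, the last residues of SEQ ID NO: 3 followed by a stop codon.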
CLAIMS

1. A method of enhancing expression of a selected protein by an organism having
a gene which produces said protein, comprising inserting into the genome of
the said organism a DNA the nucleotide sequence of which is such that the
RNA produced on transcription is different from, but the protein produced on
translation is the same as, that expressed by the gene already present in the
genome.

2. A method as claimed in claim 1, in which the organism is a plant.

3. A method as claimed in claim 2, in which the plant is a tomato plant.

4. A method as claimed in any preceding claim, in which the selected gene is the
gene encoding phytoene synthase.

5. A method as claimed in claim 4, in which the coding region of the said inserted 
gene has the sequence SEQ-ID-NO-1. 

6. A gene construct comprising in sequence a promoter which is operable in a
target organism, a coding region encoding a protein and a termination signal,
characterised in that the nucleotide sequence of the said construct is such that
the RNA produced on transcription is different from, but the protein produced
on translation is the same as, that expressed by the gene already present in the
genome.

7. A method of enhancing expression of carotenoids in a plant, comprising
overexpressing in the plant a gene specifying an enzyme necessary for the
biosynthesis of carotenoids, the said overexpression being achieved by the use
of a modified transgene having a different nucleotide sequence from the
endogenous sequence.
8. A method as claimed in claim 7, in which the modified gene specifies phytoene
synthase. 

9. A modified chloroplast targeting sequence comprising nucleotides 1 to 417 of
SEQ-ID-NO-1.
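Claims 1, 6 and 7 rely on the degeneracy of the genetic code: most amino acids are specified by more than one codon, so the nucleotide sequence of a coding region can be changed while the encoded protein stays the same. The Python sketch below is a hypothetical illustration of that idea only, not the procedure used to design the modified gene of SEQ-ID-NO-1. Assuming the standard genetic code (NCBI translation table 1), the illustrative function `recode` swaps each codon for the first synonymous alternative in table order, and `identity` reports how much nucleotide similarity remains. A practical design would also weigh factors this sketch ignores, such as codon usage in the host plant.

```python
from collections import defaultdict

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Codons grouped by the amino acid (or stop, "*") they specify.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def codons(seq: str) -> list:
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq: str) -> str:
    return "".join(CODON_TABLE[c] for c in codons(seq))


def recode(seq: str) -> str:
    """Replace each codon with a different synonymous codon where one exists.

    Hypothetical rule for illustration: take the first alternative in table
    order. Met (ATG) and Trp (TGG) have only one codon and are left as is.
    """
    out = []
    for codon in codons(seq):
        alternatives = [c for c in SYNONYMS[CODON_TABLE[codon]] if c != codon]
        out.append(alternatives[0] if alternatives else codon)
    return "".join(out)


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Nucleotides 61-120 of the DNA listing, spaces removed.
original = (
    "ATGGAATCAG" "TCCGGGAGGG" "AAACCGTTTT"
    "TTTGATTCAT" "CGAGGCATAG" "GAATTTGGTG"
)
modified = recode(original)

print(translate(modified) == translate(original))  # True: same protein
print(f"nucleotide identity: {identity(original, modified):.0%}")  # 65%
```

On this 60-nucleotide stretch every codon except the initial ATG is changed, so the translation product is unchanged while 21 of the 60 bases differ.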
FIGURE 3
P RD13

MTOM5 encodes phytoene synthase